This post is very much a work in progress. It only has EPL comparisons right now, and I’m adding data from other leagues to get more general results.

Motivation

Formations and player roles are very fuzzy concepts. We simplify things into 4-2-3-1s and 4-3-3s, and into right backs and centre forwards, but within those simplifications there are many nuances in how different teams and different players function.

In this post, I try to quantify these nuances and find teams that set up similarly across different matches.

Clustering team-matches

The teams are clustered based on the max distance between players.

Each node at the end of this graph is a particular team playing in a particular match.
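To make the idea concrete, here is a minimal sketch of how a clustering like this could be put together. It is not the exact code behind the graph, and most of it is assumption: I assume each team-match is summarised as an array of average (x, y) player positions, that players from two team-matches are paired up by minimising total distance (via SciPy's `linear_sum_assignment`), that the distance between two team-matches is the largest distance within any matched pair, and that the tree is built with complete linkage. The names `team_match_distance` and `cluster_team_matches` are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, squareform


def team_match_distance(a, b):
    """Largest distance between paired players of two team-matches.

    a, b: arrays of shape (n_players, 2) holding average (x, y) positions.
    Pairing players by minimum total distance is an assumption of this sketch.
    """
    cost = cdist(a, b)                        # all player-to-player distances
    rows, cols = linear_sum_assignment(cost)  # one-to-one pairing of players
    return cost[rows, cols].max()


def cluster_team_matches(positions, n_clusters):
    """Hierarchically cluster team-matches and cut the tree into flat clusters.

    positions: sequence of (n_players, 2) arrays, one per team-match.
    Returns an array of cluster labels starting at 1.
    """
    n = len(positions)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = team_match_distance(positions[i], positions[j])
            dist[i, j] = dist[j, i] = d
    # linkage expects the condensed (upper-triangle) form of the matrix
    tree = linkage(squareform(dist), method="complete")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Calling `cluster_team_matches(positions, 8)` would give one label per team-match, and the `tree` built inside it is what `scipy.cluster.hierarchy.dendrogram` would draw as a graph with one leaf per team-match.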

Cluster descriptions:

  • Cluster 1 is 3-4-2-1

  • Cluster 2 is 3-4-2-1 mixed with various other three-at-the-back formations

  • Cluster 3 is 4-3-3

  • Clusters 4 to 7 are a mix of various four-at-the-back formations, with a strong 4-2-3-1 element in all of them.

  • Cluster 8 doesn’t seem to have an underlying link to any particular formation.

Cluster Examples

I’ve picked a match from each cluster and overlaid it with data from all the other matches in that cluster, using their player pairings with the picked match. That gives a rough idea of how the teams in that cluster played.

I’ve also added some comparisons between clusters that look somewhat similar.

Cluster 1

Cluster Example

Comparisons

Clusters 1 and 2

Clusters 1 and 3

Cluster 2

Cluster Example

To Do

I plan to add data from La Liga, the Bundesliga, Serie A, Ligue 1, and the Championship from 2017/18. A bigger set of matches with more diverse strategies might lead to different, and more general, clusters.